In this report we analyse origin-destination flux matrices based on the furthest extent definition. We fit each model to the total BBC mobility data set (\(\Omega^{T}\)) and to three stratifications of it: by employment status (\(\Omega^{U}\), \(\Omega^{Ed}\), \(\Omega^{Em}\), \(\Omega^{N}\)), by age of user (\(\Omega^{U}\), \(\Omega^{18-30}\), \(\Omega^{30-60}\), \(\Omega^{60-100}\)) and by member nation of the UK (\(\Omega^{E}\), \(\Omega^{W}\), \(\Omega^{S}\), \(\Omega^{NI}\)). The under-18 stratum (\(\Omega^{U}\)) is common to the employment and age stratifications.
We compare the estimated mobility models to estimates from the 2011 census commuting flow data for England (\(\Omega^{CE}\)), Wales (\(\Omega^{CW}\)), Scotland (\(\Omega^{CS}\)) and Northern Ireland (\(\Omega^{CNI}\)).
We estimate posterior distributions for each model using Hamiltonian Monte Carlo (as implemented in Stan, http://mc-stan.org/). To assess model fit and provide a basis for model selection, we use approximate leave-one-out cross-validation as implemented in the loo package (doi:10.1007/s11222-016-9696-4).
The per capita probability of moving to a different local authority district (LAD) each day varies by category:
## # A tibble: 5 x 5
## employment_cat N movers p_move cat_prop
## <chr> <int> <int> <dbl> <dbl>
## 1 Under 18 2955 914 0.309 0.0683
## 2 Education 3511 1390 0.396 0.0811
## 3 Employed 30500 17227 0.565 0.705
## 4 NEET 6325 1998 0.316 0.146
## 5 Total 43291 21529 0.497 NA
## # A tibble: 4 x 5
## age_cat N movers p_move cat_prop
## <chr> <int> <int> <dbl> <dbl>
## 1 Under 18 2955 914 0.309 0.0683
## 2 18-30 9611 4859 0.506 0.222
## 3 30-60 26009 14015 0.539 0.601
## 4 60-100 4716 1741 0.369 0.109
The probability of moving also varies by LAD:
##
## Call:
## lm(formula = df$p_move ~ df$census_p_move)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.293007 -0.042909 -0.001684 0.040523 0.305933
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.15348 0.01071 14.33 <2e-16 ***
## df$census_p_move 0.84845 0.02419 35.07 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07553 on 389 degrees of freedom
## Multiple R-squared: 0.7597, Adjusted R-squared: 0.7591
## F-statistic: 1230 on 1 and 389 DF, p-value: < 2.2e-16
## 2.5 % 97.5 %
## (Intercept) 0.1324184 0.1745389
## df$census_p_move 0.8008799 0.8960149
A linear regression of \(p_{BBC}\) against \(p_{C}\) shows a strong linear relationship between the probability of moving estimated from the census data and that estimated from the total BBC data set (adjusted R-squared 0.76). The probability of moving to a different LAD per day is \(\sim\) 10% (7-12% 95% CI) greater in the BBC data set.
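As a rough check on the regression summary, the sketch below recomputes the 95% confidence intervals from the reported estimates and standard errors, and evaluates the fitted line at one census probability. The critical value 1.966 (the 0.975 quantile of a t distribution with 389 degrees of freedom) is hard-coded as an assumption, and the census probability of 0.30 is a hypothetical value chosen only for illustration.

```python
# Coefficients and standard errors copied from the lm() summary above.
INTERCEPT, INTERCEPT_SE = 0.15348, 0.01071
SLOPE, SLOPE_SE = 0.84845, 0.02419

# Assumption: 0.975 quantile of Student's t with 389 degrees of freedom,
# hard-coded (about 1.966) to avoid a dependency on SciPy.
T_CRIT = 1.966


def confidence_interval(estimate, std_error, crit=T_CRIT):
    """Return (lower, upper) for the interval estimate +/- crit * std_error."""
    half_width = crit * std_error
    return estimate - half_width, estimate + half_width


def predict_p_bbc(p_census):
    """Fitted BBC probability of moving for a given census probability."""
    return INTERCEPT + SLOPE * p_census


intercept_ci = confidence_interval(INTERCEPT, INTERCEPT_SE)
slope_ci = confidence_interval(SLOPE, SLOPE_SE)

# Hypothetical LAD with a census probability of moving of 0.30.
p_census = 0.30
excess = predict_p_bbc(p_census) - p_census

print(f"intercept CI: {intercept_ci[0]:.3f} to {intercept_ci[1]:.3f}")
print(f"slope CI: {slope_ci[0]:.3f} to {slope_ci[1]:.3f}")
print(f"fitted excess at p_census = {p_census}: {excess:.3f}")
```

Because the fitted slope is below 1, the gap between the BBC and census probabilities is not constant under this fit: it narrows as the census probability increases.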
The coverage of the BBC mobility data set, with a median of 81 users per LAD (range 2-948), means that for the majority of LADs the raw data are too sparse to estimate movement rates for each stratum of the BBC model. To address this, we fit a generalised linear mixed model (logit link, with a random intercept for each LAD) to model the per-LAD probability of moving and how it is adjusted for each stratum (age or employment status). We fit the random effects models using the lme4 package.
\[ p_{BBC} \sim \mathrm{group} + (1 \mid \mathrm{LAD}) \]
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: cbind(move, N - move) ~ group + (1 | LAD)
## Data: age_df
##
## AIC BIC logLik deviance df.resid
## 7135.2 7162.0 -3562.6 7125.2 1559
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.5262 -0.6398 0.0000 0.6584 3.6525
##
## Random effects:
## Groups Name Variance Std.Dev.
## LAD (Intercept) 0.3608 0.6007
## Number of obs: 1564, groups: LAD, 391
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.76628 0.05179 -14.795 < 2e-16 ***
## group18-30 0.87269 0.04713 18.516 < 2e-16 ***
## group30-60 0.97002 0.04351 22.295 < 2e-16 ***
## group60-100 0.25761 0.05209 4.946 7.59e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) g18-30 g30-60
## group18-30 -0.702
## group30-60 -0.760 0.842
## group60-100 -0.636 0.698 0.757
## # A tibble: 3 x 2
## group or
## <chr> <chr>
## 1 18-30 2.39 (2.18,2.62)
## 2 30-60 2.64 (2.42,2.87)
## 3 60-100 1.29 (1.17,1.43)
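The odds ratios in the table above can be recovered from the fixed effects by exponentiation. The sketch below assumes the reported intervals are Wald intervals (estimate plus or minus 1.96 standard errors on the log-odds scale), which appears consistent with the printed values; the function name is illustrative and not taken from the original analysis.

```python
import math

# Fixed-effect estimates (log odds ratios relative to the Under 18 group)
# and standard errors, copied from the age model summary above.
AGE_EFFECTS = {
    "18-30": (0.87269, 0.04713),
    "30-60": (0.97002, 0.04351),
    "60-100": (0.25761, 0.05209),
}

Z_CRIT = 1.96  # normal quantile for a 95% Wald interval (assumption)


def odds_ratio(estimate, std_error, z=Z_CRIT):
    """Return (odds ratio, lower, upper) from a log odds ratio and its SE."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )


for group, (estimate, std_error) in AGE_EFFECTS.items():
    point, lower, upper = odds_ratio(estimate, std_error)
    print(f"{group}: {point:.2f} ({lower:.2f},{upper:.2f})")
```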
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: cbind(move, N - move) ~ group + (1 | LAD)
## Data: emp_df
##
## AIC BIC logLik deviance df.resid
## 6847.1 6873.8 -3418.5 6837.1 1559
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.9221 -0.6283 0.0176 0.6700 3.8583
##
## Random effects:
## Groups Name Variance Std.Dev.
## LAD (Intercept) 0.3569 0.5974
## Number of obs: 1564, groups: LAD, 391
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.764516 0.051693 -14.789 <2e-16 ***
## groupEducation 0.527434 0.055713 9.467 <2e-16 ***
## groupEmployed 1.081079 0.043292 24.972 <2e-16 ***
## groupNEET 0.004211 0.050099 0.084 0.933
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) grpEdc grpEmp
## groupEductn -0.593
## groupEmplyd -0.765 0.718
## groupNEET -0.663 0.612 0.790
## # A tibble: 3 x 2
## group or
## <chr> <chr>
## 1 Education 1.69 (1.52,1.89)
## 2 Employed 2.95 (2.71,3.21)
## 3 NEET 1 (0.91,1.11)
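To illustrate how the fitted model turns into a per-LAD probability of moving, the sketch below applies the inverse logit to the employment model's intercept, a group effect and a LAD-level random intercept. The LAD effects used here (zero, and plus or minus one random-effect standard deviation) are illustrative choices, not estimates for any particular LAD.

```python
import math

# Values copied from the employment model summary above (log-odds scale).
INTERCEPT = -0.764516  # baseline: Under 18
GROUP_EFFECTS = {
    "Under 18": 0.0,
    "Education": 0.527434,
    "Employed": 1.081079,
    "NEET": 0.004211,
}
LAD_SD = 0.5974  # standard deviation of the LAD random intercept


def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def p_move(group, lad_effect=0.0):
    """Probability of moving for a group in a LAD with the given random effect."""
    return inv_logit(INTERCEPT + GROUP_EFFECTS[group] + lad_effect)


for group in GROUP_EFFECTS:
    low = p_move(group, -LAD_SD)   # LAD one SD below the average
    mid = p_move(group)            # average LAD (random effect of zero)
    high = p_move(group, LAD_SD)   # LAD one SD above the average
    print(f"{group}: {mid:.3f} (range {low:.3f} to {high:.3f})")
```

The probability for an average LAD need not equal the pooled proportion in the raw tables, because the random intercept gives each LAD its own baseline rather than weighting LADs by their number of users.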
Raw flux matrices comparing the census work-flow data set (A) to stratified BBC flux matrices.
The approximate leave-one-out cross-validation (LOO) method uses Pareto smoothed importance sampling (PSIS-LOO) to efficiently estimate the predictive accuracy of a model (expected log pointwise predictive density, \(\hat{elpd}\)), which serves as a basis for model comparison and selection. The estimated shape parameter \(\hat{k}\) can be used to judge the reliability of the estimate of \(\hat{elpd}\) for each data point (in our case, each LAD corresponding to a row of \(\Omega_{ji}\)). The estimate is considered reliable (quick convergence) for \(\hat{k} < 0.5\), and performance may still be reliable for values of \(\hat{k}\) up to 0.7. Values of \(\hat{k} > 0.7\) suggest that the data point is highly influential on the estimated posterior and may introduce bias.
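The thresholds above can be expressed as a simple classification rule. The following is a minimal sketch, with hypothetical \(\hat{k}\) values for three made-up LADs; the labels and function names are illustrative and are not part of the loo package.

```python
def classify_pareto_k(k_hat):
    """Map a Pareto shape estimate to the reliability bands described above."""
    if k_hat < 0.5:
        return "reliable"
    if k_hat <= 0.7:
        return "possibly reliable"
    return "unreliable"


def flag_influential(k_by_lad, threshold=0.7):
    """Return the sorted names of LADs whose k-hat exceeds the threshold."""
    return sorted(lad for lad, k in k_by_lad.items() if k > threshold)


# Hypothetical k-hat values for three made-up LADs, for illustration only.
example_k = {"LAD_A": 0.12, "LAD_B": 0.63, "LAD_C": 0.91}

labels = {lad: classify_pareto_k(k) for lad, k in example_k.items()}
flagged = flag_influential(example_k)

print(labels)
print(flagged)
```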
The Highland LAD fails the PSIS diagnostic check with a value of \(\hat{k}>0.7\) (although the effect is smaller than for the next (frequency) OD matrix).
Comparison of a model fitted to the full 32 Scottish LADs with one fitted to a reduced data set with Highland removed (31 LADs) illustrates the systematic bias introduced in the distance scaling (\(\rho\) parameter). Although the posterior distributions overlap, we consider the effect large enough to motivate removing the Highland LAD from inference and from model comparison.
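The difference between \(\hat{elpd}\) for alternative models fitted to the same data provides a measure of their relative predictive accuracy. In the tables that follow, each entry is the difference in \(\hat{elpd}\) from the best-performing model for that data set, with its standard error in parentheses.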
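A difference in \(\hat{elpd}\) and its standard error can be computed from the pointwise contributions of two models fitted to the same data points. The sketch below uses the paired formula (sum of pointwise differences, with standard error equal to the square root of the number of points times the sample standard deviation of the differences), which to our understanding is what the loo package reports. The pointwise values are hypothetical.

```python
import math
import statistics


def elpd_difference(pointwise_candidate, pointwise_best):
    """Return (elpd_diff, se_diff) for a candidate model relative to the best.

    Both inputs hold one pointwise elpd value per data point, in the same order.
    """
    if len(pointwise_candidate) != len(pointwise_best):
        raise ValueError("models must be evaluated on the same data points")
    diffs = [c - b for c, b in zip(pointwise_candidate, pointwise_best)]
    n = len(diffs)
    elpd_diff = sum(diffs)
    # Paired standard error: sqrt(n) times the sample SD of the differences.
    se_diff = math.sqrt(n) * statistics.stdev(diffs)
    return elpd_diff, se_diff


# Hypothetical pointwise elpd values for four data points.
best_model = [-1.0, -2.0, -3.0, -4.0]
candidate_model = [-1.5, -2.0, -4.0, -4.5]

diff, se = elpd_difference(candidate_model, best_model)
print(f"{diff:.2f} ({se:.2f})")
```

A negative difference that is large relative to its standard error indicates that the candidate predicts clearly worse than the best model; differences within a couple of standard errors are hard to distinguish.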
##   model   elpd           model   elpd         model   elpd        model   elpd
## 1 ERad    0 (0)          CDO     0 (0)        CDO     0 (0)       CDO     0 (0)
## 2 CDO     -11500 (1100)  CDP     -23.2 (7.8)  CDP     -16.4 (10)  CDE     -4.18 (3.2)
## 3 CDP     -12300 (1100)  CDE     -82.5 (17)   IO      -239 (51)   CDP     -32.6 (9.4)
## 4 IO      -35400 (640)   IO      -278 (32)    ERad    -256 (45)   IO      -50.9 (8.2)
## 5 CDE     -41400 (850)   ERad    -302 (26)    CDE     -351 (37)   ERad    -65 (7.2)
## 6 Imp     -60900 (880)   Imp     -392 (31)    Imp     -465 (44)   Imp     -78.9 (5.1)
## 7 Stoufer -79300 (960)   Stoufer -486 (33)    Stoufer -644 (44)   Stoufer -98.8 (6.3)
##   model   elpd          model   elpd         model   elpd         model   elpd
## 1 CDO     0 (0)         CDO     0 (0)        CDO     0 (0)        CDP     0 (0)
## 2 ERad    -272 (88)     CDP     -7.2 (3.3)   CDP     -12.3 (9.5)  CDO     -1.6 (1)
## 3 CDE     -308 (38)     CDE     -7.3 (2.9)   CDE     -16.7 (11)   IO      -4.72 (4)
## 4 CDP     -2670 (120)   IO      -44.4 (8.2)  IO      -44 (17)     CDE     -5.64 (2.1)
## 5 IO      -3550 (120)   ERad    -55.4 (7.6)  ERad    -57.6 (20)   ERad    -11 (4.7)
## 6 Imp     -6180 (170)   Imp     -87.1 (11)   Imp     -131 (23)    Imp     -27 (7.1)
## 7 Stoufer -11200 (240)  Stoufer -167 (13)    Stoufer -244 (31)    Stoufer -45.8 (6.6)
##   model   elpd          model   elpd         model   elpd         model   elpd          model   elpd
## 1 CDO     0 (0)         CDO     0 (0)        CDO     0 (0)        CDO     0 (0)         CDO     0 (0)
## 2 CDE     -316 (38)     CDE     -29.7 (7.7)  CDE     -26.3 (8.3)  CDE     -306 (34)     CDE     -38.4 (10)
## 3 ERad    -388 (94)     ERad    -85 (19)     ERad    -37.6 (23)   ERad    -427 (82)     ERad    -94.8 (28)
## 4 CDP     -3580 (160)   CDP     -211 (30)    CDP     -373 (41)    CDP     -2990 (150)   CDP     -636 (54)
## 5 IO      -4630 (130)   IO      -513 (46)    IO      -628 (45)    IO      -4000 (130)   IO      -995 (54)
## 6 Imp     -7510 (180)   Imp     -918 (45)    Imp     -840 (48)    Imp     -6420 (170)   Imp     -1320 (62)
## 7 Stoufer -13500 (280)  Stoufer -1620 (71)   Stoufer -1930 (88)   Stoufer -12200 (270)  Stoufer -2870 (99)
##   model   elpd         model   elpd         model   elpd          model   elpd
## 1 CDO     0 (0)        CDO     0 (0)        CDO     0 (0)         CDO     0 (0)
## 2 CDE     -29.7 (7.7)  CDE     -109 (16)    CDE     -250 (31)     CDE     -42.9 (11)
## 3 ERad    -85 (19)     ERad    -180 (39)    ERad    -326 (73)     ERad    -104 (23)
## 4 CDP     -211 (30)    CDP     -1020 (80)   CDP     -2650 (140)   CDP     -456 (45)
## 5 IO      -513 (46)    IO      -1590 (84)   IO      -3560 (110)   IO      -828 (48)
## 6 Imp     -918 (45)    Imp     -2380 (95)   Imp     -5590 (150)   Imp     -1110 (54)
## 7 Stoufer -1620 (71)   Stoufer -5230 (180)  Stoufer -10800 (240)  Stoufer -2570 (93)
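The CDO model (competing destinations with offset) is favoured for nearly every data set. The exceptions are the census commuting flow data for England, for which the extended radiation model (ERad) is favoured, and one comparison in which CDP ranks first with CDO close behind (difference in \(\hat{elpd}\) of -1.6, standard error 1). For comparison between data sets we use the CDO model.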